Social Determinants in Diabetes Prevalence and Management
BMIN503/EPID600 Final Project
Author
Karen Tang
1 Overview
This project analyzes the role of social determinants in diabetes prevalence and management. The social determinants explored in this project are race, income, insurance coverage, and clinic proximity. The goal is to understand how these factors influence the prevalence and management of diabetes, providing insights that can inform public health strategies, healthcare policies, and intervention programs.
I spoke with Dr. Richard Tsui about my project, and he advised me to choose specific social determinants that correlate directly with the disease I want to learn more about.
2 Introduction
According to the CDC, in 2020, 38.4 million people of all ages in the United States had diabetes, and diabetes was the eighth leading cause of death in the United States. The article “Overview of Social Determinants of Health in the Development of Diabetes” from Diabetes Journals states that diabetes shows long-standing, well-documented socioeconomic and racial/ethnic inequalities in prevalence, incidence, morbidity, and mortality. Higher diabetes prevalence is associated with lower education, lower income, and non-White race/ethnicity.
The World Health Organization (WHO) Commission defined Social Determinants of Health (SDOH) as “the conditions in which people are born, grow, live, work and age, and the wider set of forces and systems shaping the conditions of daily life”. SDOH are estimated to account for 30%–55% of health outcomes and are viewed as the main driver of avoidable health inequities. Because of the association between social determinants of health and diabetes, I conduct an analysis of the following factors: race, income, insurance coverage, and distance to clinics.
3 Methods
The datasets I used are the following:
1. Diabetes: the dataset is uploaded to this repository under the name “ExportCSV.csv”. It is state-level survey data for the year 2022 from the Behavioral Risk Factor Surveillance System (BRFSS).
2. Social determinants: the dataset is county-level data (Census Tract 2020) from the Social Determinants of Health Database. Because the file is too large to upload here, I filtered the dataset to Pennsylvania and West Virginia only and exported it as a CSV file. The dataset is uploaded to this repository under the name “PA_WV_social”.
3. Geospatial data: from the tigris package in R.
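The filter-and-export step described for the social determinants dataset can be sketched in base R as below. This is a minimal illustration on a tiny hypothetical table, not the author's code: the column names `STATE`, `COUNTY`, and `median_income` and all values are assumptions, not confirmed fields of the SDOH Database.

```r
# Tiny hypothetical stand-in for the SDOH table.
# Column names and values are illustrative assumptions only.
sdoh <- data.frame(
  STATE         = c("Pennsylvania", "West Virginia", "Ohio"),
  COUNTY        = c("County A", "County B", "County C"),  # hypothetical
  median_income = c(60000, 45000, 55000),                 # hypothetical
  stringsAsFactors = FALSE
)

# Keep only the Pennsylvania and West Virginia rows
pa_wv <- sdoh[sdoh$STATE %in% c("Pennsylvania", "West Virginia"), ]

# Export to CSV (a temporary file here; the project's file is "PA_WV_social")
out_file <- tempfile(fileext = ".csv")
write.csv(pa_wv, out_file, row.names = FALSE)

# Read the file back to confirm the export round-trips
pa_wv_back <- read.csv(out_file, stringsAsFactors = FALSE)
print(pa_wv_back)
```

Filtering before export keeps the uploaded file small, which is the motivation given above for not committing the full national file.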
# Loading the necessary packages
library(readxl)
library(readr)     # read_csv() for the BRFSS export
library(dplyr)
library(ggplot2)
library(sf)
library(tigris)
options(tigris_use_cache = TRUE)   # cache downloaded boundary files, as tigris recommends
library(leaflet)
library(maps)
# Reading the BRFSS 2022 diabetes data (235 rows × 28 columns: 21 character, 7 numeric)
diabetes <- readr::read_csv("ExportCSV.csv", show_col_types = FALSE)
# Keep only the Pennsylvania and West Virginia rows
PA_WV_diabetes <- diabetes %>%
  filter(LocationDesc %in% c("Pennsylvania", "West Virginia"))
# Convert Data_Value to numeric in order to graph/compare later
# (PA_WV_diabetes was created before this step, so its Data_Value stays character)
diabetes$Data_Value <- as.numeric(diabetes$Data_Value)
Warning: NAs introduced by coercion
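The warning means some `Data_Value` entries could not be parsed as numbers and became `NA`. A minimal base-R sketch for listing which original strings failed is shown below; the placeholder strings `"~"` and `"N/A"` are hypothetical and not confirmed values from the BRFSS export.

```r
# Hypothetical Data_Value column as read from CSV: numbers plus
# placeholder strings ("~" and "N/A" are illustrative assumptions).
raw <- c("11.5", "17.4", "~", "N/A", "9.8")

# Coerce to numeric; unparseable strings become NA. The warning is
# suppressed here because the failures are inspected explicitly next.
parsed <- suppressWarnings(as.numeric(raw))

# Original entries that were present but failed to parse
failed <- raw[is.na(parsed) & !is.na(raw)]
print(failed)
```

Checking the failed entries this way distinguishes genuinely missing estimates from formatting problems before the NAs propagate into the map.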
# Looking for the state with the highest diabetes prevalence, excluding the median row and U.S. territories
most_diabetes <- diabetes %>%
  filter(Response == "Yes",
         !grepl("median", LocationDesc, ignore.case = TRUE),
         !LocationDesc %in% c("Guam", "Puerto Rico", "Virgin Islands")) %>%
  select(LocationDesc, Response, Data_Value) %>%
  arrange(desc(Data_Value)) %>%
  head(1)
# Summarize the result (summary() does not display the state name; print(most_diabetes) would)
summary(most_diabetes)
 LocationDesc         Response           Data_Value
 Length:1           Length:1           Min.   :17.4
 Class :character   Class :character   1st Qu.:17.4
 Mode  :character   Mode  :character   Median :17.4
                                       Mean   :17.4
                                       3rd Qu.:17.4
                                       Max.   :17.4
The output above shows a maximum of 17.4%, which matches West Virginia's value in the table below: West Virginia is the state with the highest percentage of its population diagnosed with diabetes.
Percentage of the population with diabetes in Pennsylvania and West Virginia:
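Because `summary()` hides the state name, the top state has to be inferred from the value. A minimal base-R sketch that keeps the name attached uses `which.max()` on the filtered rows (dplyr's `slice_max(Data_Value, n = 1)` is an alternative). Only the Pennsylvania and West Virginia values come from this report; the median and Guam rows, their labels, and their values are made up for illustration.

```r
# Hypothetical rows: PA and WV values are from this report; the median
# and Guam rows and their values are invented for illustration.
toy_rows <- data.frame(
  LocationDesc = c("Pennsylvania", "West Virginia",
                   "All States (median)", "Guam"),
  Response     = "Yes",
  Data_Value   = c(11.5, 17.4, 11.0, 20.0),
  stringsAsFactors = FALSE
)

# Drop the median summary row and U.S. territories, as in the report
keep <- (!grepl("median", toy_rows$LocationDesc, ignore.case = TRUE)) &
  (!(toy_rows$LocationDesc %in% c("Guam", "Puerto Rico", "Virgin Islands")))
states_only <- toy_rows[keep, ]

# which.max() returns a row index, so the state name stays in the result
top <- states_only[which.max(states_only$Data_Value),
                   c("LocationDesc", "Data_Value")]
print(top)
```

Applying the exclusions before taking the maximum matters: without them, a territory with a higher value would be reported as the top "state".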
# A tibble: 2 × 3
  LocationDesc  Response Data_Value
  <chr>         <chr>    <chr>
1 Pennsylvania  Yes      11.5
2 West Virginia Yes      17.4
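The two values in the table can be compared in absolute and relative terms. The sketch below uses only the numbers reported above; it ignores the survey's `Low_Confidence_Limit` and `High_Confidence_Limit` columns, which a formal comparison should take into account.

```r
# Prevalence values (%) reported above for the two states
pa <- 11.5
wv <- 17.4

# Absolute difference in percentage points
diff_pp <- wv - pa
# Relative ratio: how many times Pennsylvania's prevalence West Virginia's is
ratio <- wv / pa

cat(sprintf("Difference: %.1f percentage points\n", diff_pp))
cat(sprintf("Ratio: %.2f\n", ratio))
```

By this arithmetic, West Virginia's prevalence is 5.9 percentage points higher than Pennsylvania's, or roughly 1.5 times as high.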
Visualizing the percentage of the population with diabetes across all U.S. states:
# Diabetes = "Yes" rows, excluding the median row and U.S. territories
diabetes_yes <- diabetes %>%
  filter(Response == "Yes",
         !grepl("median", LocationDesc, ignore.case = TRUE),
         !LocationDesc %in% c("Guam", "Puerto Rico", "Virgin Islands"))
# Download county boundaries for every U.S. state
counties1 <- counties(cb = TRUE, class = "sf")
# Joining diabetes and counties to map (each county inherits its state's value)
diabetes_map <- inner_join(diabetes_yes, counties1, by = c("LocationAbbr" = "STUSPS"))
# Making the joined data an sf object and reprojecting from NAD83 to WGS84 (EPSG:4326),
# the datum leaflet expects; this avoids the "inconsistent datum" warning
diabetes_map <- st_as_sf(diabetes_map) %>%
  st_transform(4326)
# Create color palette based on the percentage of diabetes
pal <- colorNumeric(palette = "YlOrRd", domain = diabetes_map$Data_Value)
# Create a popup showing diabetes data
map_data <- diabetes_map %>%
  mutate(popup_info = paste0("<b>State:</b> ", LocationAbbr, "<br>",
                             "<b>Diabetes (%):</b> ", Data_Value, "<br>"))
# Generate the map, fitted to the bounding box of the data
bb <- st_bbox(map_data)
diabetes_by_state <- leaflet(data = map_data) %>%
  addTiles() %>%
  addPolygons(fillColor = ~pal(Data_Value),
              fillOpacity = 0.5,
              color = "black",
              weight = 1,
              popup = ~popup_info) %>%
  fitBounds(lng1 = bb[["xmin"]],  # Minimum longitude
            lat1 = bb[["ymin"]],  # Minimum latitude
            lng2 = bb[["xmax"]],  # Maximum longitude
            lat2 = bb[["ymax"]])  # Maximum latitude
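The join above pairs one state-level value with many county polygons. A minimal base-R sketch of that one-to-many join, using hypothetical placeholder counties, shows that every county row simply repeats its state's value. `merge()` here stands in for dplyr's `inner_join()`. Downloading state boundaries with `tigris::states(cb = TRUE)` instead of `counties()` would give one polygon per state; that is a suggested alternative, not what the project used.

```r
# Hypothetical miniature of the join: one value per state,
# several placeholder county rows per state.
toy_state <- data.frame(
  LocationAbbr = c("PA", "WV"),
  Data_Value   = c(11.5, 17.4)
)
toy_counties <- data.frame(
  STUSPS = c("PA", "PA", "PA", "WV", "WV"),
  NAME   = c("County 1", "County 2", "County 3", "County 4", "County 5")
)

# Base-R equivalent of inner_join(..., by = c("LocationAbbr" = "STUSPS"))
joined <- merge(toy_state, toy_counties,
                by.x = "LocationAbbr", by.y = "STUSPS")

# Every county row carries its state's value, so rows multiply
print(joined)
print(table(joined$LocationAbbr))
```

The resulting map is therefore a state-level choropleth drawn with county outlines: county borders are visible, but all counties within a state share one color.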